我们介绍了DeepNash,这是一种能够学习从头开始播放不完美的信息游戏策略的自主代理,直到人类的专家级别。 Stratego是人工智能(AI)尚未掌握的少数标志性棋盘游戏之一。这个受欢迎的游戏具有$ 10^{535} $节点的巨大游戏树,即,$ 10^{175} $倍的$倍于GO。它具有在不完美的信息下需要决策的其他复杂性,类似于德克萨斯州Hold'em扑克,该扑克的游戏树较小(以$ 10^{164} $节点为单位)。 Stratego中的决策是在许多离散的动作上做出的,而动作与结果之间没有明显的联系。情节很长,在球员获胜之前经常有数百次动作,而Stratego中的情况则不能像扑克中那样轻松地分解成管理大小的子问题。由于这些原因,Stratego几十年来一直是AI领域的巨大挑战,现有的AI方法几乎没有达到业余比赛水平。 Deepnash使用游戏理论,无模型的深钢筋学习方法,而无需搜索,该方法学会通过自我播放来掌握Stratego。 DeepNash的关键组成部分的正则化NASH Dynamics(R-NAD)算法通过直接修改基础多项式学习动力学来收敛到近似NASH平衡,而不是围绕它“循环”。 Deepnash在Stratego中击败了现有的最先进的AI方法,并在Gravon Games平台上获得了年度(2022年)和历史前3名,并与人类专家竞争。
translated by 谷歌翻译
It would be useful for machines to use computers as humans do so that they can aid us in everyday tasks. This is a setting in which there is also the potential to leverage large-scale expert demonstrations and human judgements of interactive behaviour, which are two ingredients that have driven much recent success in AI. Here we investigate the setting of computer control using keyboard and mouse, with goals specified via natural language. Instead of focusing on hand-designed curricula and specialized action spaces, we focus on developing a scalable method centered on reinforcement learning combined with behavioural priors informed by actual human-computer interactions. We achieve state-of-the-art and human-level mean performance across all tasks within the MiniWob++ benchmark, a challenging suite of computer control problems, and find strong evidence of cross-task transfer. These results demonstrate the usefulness of a unified human-agent interface when training machines to use computers. Altogether our results suggest a formula for achieving competency beyond MiniWob++ and towards controlling computers, in general, as a human would.
translated by 谷歌翻译
This dissertation reports some first steps towards a compositional account of active inference and the Bayesian brain. Specifically, we use the tools of contemporary applied category theory to supply functorial semantics for approximate inference. To do so, we define on the `syntactic' side the new notion of Bayesian lens and show that Bayesian updating composes according to the compositional lens pattern. Using Bayesian lenses, and inspired by compositional game theory, we define categories of statistical games and use them to classify various problems of statistical inference. On the `semantic' side, we present a new formalization of general open dynamical systems (particularly: deterministic, stochastic, and random; and discrete- and continuous-time) as certain coalgebras of polynomial functors, which we show collect into monoidal opindexed categories (or, alternatively, into algebras for multicategories of generalized polynomial functors). We use these opindexed categories to define monoidal bicategories of cilia: dynamical systems which control lenses, and which supply the target for our functorial semantics. Accordingly, we construct functors which explain the bidirectional compositional structure of predictive coding neural circuits under the free energy principle, thereby giving a formal mathematical underpinning to the bidirectionality observed in the cortex. Along the way, we explain how to compose rate-coded neural circuits using an algebra for a multicategory of linear circuit diagrams, showing subsequently that this is subsumed by lenses and polynomial functors. Because category theory is unfamiliar to many computational neuroscientists and cognitive scientists, we have made a particular effort to give clear, detailed, and approachable expositions of all the category-theoretic structures and results of which we make use.
translated by 谷歌翻译
Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens, which unlike previous methods can model interactions between all image regions, and extracts powerful representations during training. Extensive experiments show that GLAM-Swin or GLAM-Swin-UNet exhibit substantially better performances than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.
translated by 谷歌翻译
This white paper lays out a vision of research and development in the field of artificial intelligence for the next decade (and beyond). Its denouement is a cyber-physical ecosystem of natural and synthetic sense-making, in which humans are integral participants$\unicode{x2014}$what we call ''shared intelligence''. This vision is premised on active inference, a formulation of adaptive behavior that can be read as a physics of intelligence, and which inherits from the physics of self-organization. In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world$\unicode{x2014}$also known as self-evidencing. Formally, this corresponds to maximizing (Bayesian) model evidence, via belief updating over several scales: i.e., inference, learning, and model selection. Operationally, this self-evidencing can be realized via (variational) message passing or belief propagation on a factor graph. Crucially, active inference foregrounds an existential imperative of intelligent systems; namely, curiosity or the resolution of uncertainty. This same imperative underwrites belief sharing in ensembles of agents, in which certain aspects (i.e., factors) of each agent's generative world model provide a common ground or frame of reference. Active inference plays a foundational role in this ecology of belief sharing$\unicode{x2014}$leading to a formal account of collective intelligence that rests on shared narratives and goals. We also consider the kinds of communication protocols that must be developed to enable such an ecosystem of intelligences and motivate the development of a shared hyper-spatial modeling language and transaction protocol, as a first$\unicode{x2014}$and key$\unicode{x2014}$step towards such an ecology.
translated by 谷歌翻译
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
translated by 谷歌翻译
Assigning qualified, unbiased and interested reviewers to paper submissions is vital for maintaining the integrity and quality of the academic publishing system and providing valuable reviews to authors. However, matching thousands of submissions with thousands of potential reviewers within a limited time is a daunting challenge for a conference program committee. Prior efforts based on topic modeling have suffered from losing the specific context that help define the topics in a publication or submission abstract. Moreover, in some cases, topics identified are difficult to interpret. We propose an approach that learns from each abstract published by a potential reviewer the topics studied and the explicit context in which the reviewer studied the topics. Furthermore, we contribute a new dataset for evaluating reviewer matching systems. Our experiments show a significant, consistent improvement in precision when compared with the existing methods. We also use examples to demonstrate why our recommendations are more explainable. The new approach has been deployed successfully at top-tier conferences in the last two years.
translated by 谷歌翻译
社会科学研究中文本数据的使用增加受益于易于访问的数据(例如Twitter)。这种趋势是以研究成本需要敏感但难以分享的数据的成本(例如,访谈数据,警察报告,电子健康记录)。我们使用开源文本匿名软件_textwash_介绍了该僵局的解决方案。本文使用TILD标准介绍了该工具的经验评估:技术评估(工具的准确性?),信息损失评估(匿名过程中丢失了多少信息?)和De-Nomenymisation Test(可以可以使用(可以可以可以使用)测试(可以可以使用匿名测试(可以人类从匿名文本数据中识别个人吗?)。研究结果表明,TextWash的性能类似于最新的实体识别模型,并引入了可忽略的信息损失0.84%。对于De-nonymisation测试,我们任命人类从众包人的描述数据集中对非常著名,半著名和不存在的个人的描述来识别个人。该工具的现实用例的匿名率范围为1.01-2.01%。我们在第二项研究中复制了发现,并得出结论,Textwash成功地删除了潜在的敏感信息,这些信息实际上使人描述实际上是匿名的。
translated by 谷歌翻译
我们使用新的近似推理学说的概念来开发活性推断的组成理论。为了展示此类函子,我们首先使用多项式函数的语言的概括来提供必要类型的组成界面:与结构的多项式索引类别,我们构建了不同的单核生物,我们构建了差异性的差异类别和动态``层次推理系统'',其中近似推理学说具有语义。然后,我们描述``外部参数化''的统计游戏,并使用它们来构建两个在计算神经科学文献中发现的近似推理学说,我们称之为“ laplace”和``hebb-laplace''教义:前者是前者产生动态系统的,这些系统会产生动态系统,这些系统会产生动态系统,这些系统是制作动态系统的。优化高斯模型的后代;后者产生的系统还优化了确定其预测的参数(或“权重”)。
translated by 谷歌翻译
是的 - 这项研究研究了普通位置的损失图像压缩对受试者的种族特征的面部识别算法的影响。我们采用了最近提出的基于种族表型的偏见分析方法,以衡量种族表型类别中不同损失压缩水平的影响。此外,我们确定了色度补充采样和与种族相关的表型之间的关系,以识别表现。先前的工作调查了损失的JPEG压缩算法对当代面部识别性能的影响。但是,这种影响与不同种族相关的截面组以及这种影响的原因存在差距。通过广泛的实验设置,我们证明了常见的损失图像压缩方法对特定种族表型类别(例如较深的肤色(最高34.55 \%))的面部识别性能具有更明显的负面影响。此外,在压缩过程中除去色度补充采样可提高所有受压缩影响的表型类别的错误匹配率(高达15.95 \%),包括较深的肤色,宽阔的鼻子,大嘴唇,大嘴唇和单层眼类别。此外,我们概述了可能归因于这种现象的基本原因的特征,例如JPEG等有损压缩算法。
translated by 谷歌翻译